Unsupervised pre-training on millions of born-digital or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives have been studied in existing solutions, the document textline, an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, and can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that Wukong-Reader achieves superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also endows Wukong-Reader with promising localization ability.
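A minimal sketch of what such a textline-region contrastive objective could look like (our illustration with hypothetical shapes and names, not the authors' implementation): each textline's visual region embedding is pulled toward the text embedding of the same textline and pushed away from the others via a symmetric InfoNCE loss.

```python
import torch
import torch.nn.functional as F

def textline_contrastive_loss(region_emb, text_emb, temperature=0.07):
    """region_emb, text_emb: (num_textlines, dim) embeddings of the same lines."""
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / temperature   # pairwise similarities
    targets = torch.arange(region_emb.size(0))         # i-th region <-> i-th text
    # Symmetric InfoNCE: region-to-text plus text-to-region
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```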
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
In this paper, we propose a novel graph kernel, namely the Quantum-based Entropic Subtree Kernel (QESK), for graph classification. To this end, we commence by computing the Average Mixing Matrix (AMM) of the Continuous-time Quantum Walk (CTQW) evolved on each graph structure. Moreover, we show how the AMM can be employed to compute a series of entropic subtree representations associated with the classical Weisfeiler-Lehman (WL) algorithm. For a pair of graphs, the QESK kernel is defined by computing the exponentiation of the negative Euclidean distance between their entropic subtree representations, theoretically resulting in a positive definite graph kernel. We show that the proposed QESK kernel not only encapsulates complicated intrinsic quantum-based structural characteristics of graph structures through the CTQW, but also theoretically addresses the shortcoming of ignoring the effects of unshared substructures that arises in state-of-the-art R-convolution graph kernels. Moreover, unlike the classical R-convolution kernels, the proposed QESK can discriminate between isomorphic subtrees in terms of the global graph structure, theoretically explaining its effectiveness. Experiments indicate that the proposed QESK kernel can significantly outperform state-of-the-art graph kernels and graph deep learning methods on graph classification problems.
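From the definition above, the kernel between two graphs $G_a$ and $G_b$ with entropic subtree representations $\phi(G_a)$ and $\phi(G_b)$ can be read as a Laplacian-style RBF kernel (the scaling parameter $\lambda$ is our assumption; the abstract does not state one):

$$K_{\mathrm{QESK}}(G_a, G_b) = \exp\!\big(-\lambda\,\lVert \phi(G_a) - \phi(G_b) \rVert_2\big),$$

which is positive definite since the exponentiated negative Euclidean distance is a valid kernel.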
In this work, we propose a family of novel quantum kernels, namely the Hierarchical Aligned Quantum Jensen-Shannon Kernels (HAQJSK), for unattributed graphs. Different from most existing classical graph kernels, the proposed HAQJSK kernels can incorporate hierarchically aligned structural information between graphs and transform graphs of arbitrary sizes into fixed-sized aligned graph structures, i.e., the Hierarchical Transitive Aligned Adjacency Matrix of vertices and the Hierarchical Transitive Aligned Density Matrix of the Continuous-Time Quantum Walk (CTQW). For a pair of graphs at hand, the resulting HAQJSK kernels are defined by measuring the Quantum Jensen-Shannon Divergence (QJSD) between their transitively aligned graph structures. We show that the proposed HAQJSK kernels not only reflect richer intrinsic global graph characteristics in terms of the CTQW, but also address the drawback of neglecting structural correspondence information that arises in most existing R-convolution kernels. Furthermore, unlike previous Quantum Jensen-Shannon Kernels associated with the QJSD and the CTQW, the proposed HAQJSK kernels can simultaneously guarantee permutation invariance and positive definiteness, explaining the theoretical advantages of the HAQJSK kernels. Experiments indicate the effectiveness of the proposed kernels.
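For reference, the QJSD between two density matrices $\rho$ and $\sigma$ (here, the aligned density matrices of the two graphs' CTQWs) is the standard quantity

$$\mathrm{QJSD}(\rho,\sigma) = H_N\!\Big(\frac{\rho+\sigma}{2}\Big) - \frac{1}{2}\big(H_N(\rho) + H_N(\sigma)\big), \qquad H_N(\rho) = -\operatorname{Tr}(\rho\log\rho),$$

where $H_N$ denotes the von Neumann entropy; how this divergence is turned into the final kernel value is specified in the paper itself.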
There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity, where in each cluster, each protected group is represented in the same proportion as in the whole dataset. While the FairSC algorithm (Kleindessner et al., 2019) is able to find fairer clusterings, it is compromised by the high cost of explicitly computing nullspaces and square roots of dense matrices. We present a new formulation of the underlying spectral computation that incorporates nullspace projection and Hotelling's deflation, such that the resulting algorithm, called s-FairSC, involves only sparse matrix-vector products and is able to fully exploit the sparsity of the fair SC model. Experimental results on the modified stochastic block model demonstrate that s-FairSC is comparable with FairSC in recovering fair clusterings, while being sped up by a factor of 12 for moderate model sizes. s-FairSC is further shown to be scalable in the sense that its computational cost increases only marginally compared to SC without fairness constraints.
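A minimal sketch of the implicit nullspace-projection idea (our simplification, not the s-FairSC implementation; dimensions, names, and the tolerance are assumptions): eigenvectors of the graph Laplacian are sought inside null(F^T), where F encodes the group-fairness constraints, using only sparse matrix-vector products and one small dense solve.

```python
# Sketch only: project the Laplacian matvec onto null(F^T) instead of forming a
# dense nullspace basis. Assumes L is a sparse Laplacian and F is a thin dense
# (n x (g-1)) constraint matrix for g protected groups.
import numpy as np
import scipy.sparse.linalg as spla

def fair_spectral_embedding(L, F, k):
    FtF_inv = np.linalg.inv(F.T @ F)            # small (g-1) x (g-1) inverse
    def project(x):                             # P x = x - F (F^T F)^{-1} F^T x
        return x - F @ (FtF_inv @ (F.T @ x))
    def matvec(x):                              # apply P L P via sparse products
        return project(L @ project(x))
    op = spla.LinearOperator(L.shape, matvec=matvec, dtype=float)
    # span(F) shows up as spurious zero modes of P L P; s-FairSC suppresses them
    # with Hotelling's deflation, while this sketch simply requests extra
    # eigenpairs and discards the ones lying in span(F).
    vals, vecs = spla.eigsh(op, k=k + F.shape[1], which='SA')
    in_range = np.linalg.norm(project(vecs), axis=0) > 1e-8
    return vecs[:, in_range][:, :k]
```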
Despite the recent success of self-supervised contrastive learning models based on 3D point cloud representations, the adversarial robustness of such pre-trained models has raised concerns. Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models. In contrastive learning, the projector is considered an effective component for removing unnecessary feature information during contrastive pre-training, and most ACL works likewise use the contrastive loss on projected feature representations to generate adversarial examples during pre-training, while the "unprojected" feature representations are the ones used during inference. Due to the distribution gap between the projected and "unprojected" features, such models are limited in their ability to obtain reliable feature representations for downstream tasks. We introduce a new method that uses the "unprojected" feature representations within the contrastive learning framework, leveraging a virtual adversarial loss to generate high-quality 3D adversarial examples for adversarial training. We also present a robustness-aware loss function for adversarial self-supervised contrastive learning. Furthermore, we find that selecting points with a high Difference of Normals (DoN) response as an additional input for adversarial self-supervised contrastive learning can significantly improve the adversarial robustness of the pre-trained model. We validate our method on downstream tasks, including 3D classification and 3D segmentation, using multiple datasets. It achieves robust accuracy comparable to state-of-the-art adversarial contrastive learning methods.
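A rough sketch of the Difference-of-Normals (DoN) selection mentioned above (our illustration with hypothetical neighborhood sizes and keep ratio, not the authors' code): points whose estimated surface normals differ strongly across two neighborhood scales are kept as the additional high-variation input.

```python
# Sketch only: PCA-based normal estimation at two scales; the DoN magnitude
# flags geometrically varying regions. Normal-sign ambiguity is handled with a
# crude orientation heuristic and would need proper treatment in practice.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_normals(points, k):
    _, idx = NearestNeighbors(n_neighbors=k).fit(points).kneighbors(points)
    normals = np.empty_like(points)
    for i, neigh in enumerate(idx):
        local = points[neigh] - points[neigh].mean(axis=0)
        _, _, vt = np.linalg.svd(local, full_matrices=False)
        n = vt[-1]                              # smallest-variance direction
        normals[i] = n if n[2] >= 0 else -n     # fix sign (heuristic)
    return normals

def don_select(points, k_small=10, k_large=40, keep_ratio=0.5):
    don = 0.5 * (estimate_normals(points, k_small) -
                 estimate_normals(points, k_large))
    magnitude = np.linalg.norm(don, axis=1)     # large = high geometric variation
    keep = np.argsort(-magnitude)[: int(len(points) * keep_ratio)]
    return points[keep]
```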
IceCube is a cubic-kilometer array of optical sensors for detecting atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in IceCube data analysis. Reconstructing and classifying events is challenging due to the detector's geometry, the inhomogeneous scattering and absorption of light in the ice, and, below 100 GeV, the relatively small number of signal photons produced per event. To address this challenge, IceCube events can be represented as point cloud graphs, with a graph neural network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction, and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR) compared to the current IceCube method. Alternatively, at a fixed signal efficiency, the GNN reduces the FPR by more than a factor of 8 (to below half a percent). For the reconstruction of energy, direction, and interaction vertex, the resolution improves by 13%-20% on average compared to current maximum likelihood techniques. When run on a GPU, the GNN is capable of processing IceCube events at a rate close to the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
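As an illustration of the point-cloud-graph representation (the feature layout and library choice are our assumptions, not the IceCube pipeline): each sensor hit becomes a graph node carrying position, time, and charge, connected to its nearest neighbors.

```python
# Sketch only: build a kNN graph over sensor hits for a GNN to consume.
import torch
from torch_geometric.data import Data
from torch_geometric.nn import knn_graph

def event_to_graph(hits, k=8):
    """hits: (num_hits, 5) float tensor of [x, y, z, time, charge] per hit."""
    pos = hits[:, :3]
    edge_index = knn_graph(pos, k=k)   # connect each hit to its k nearest hits
    return Data(x=hits, pos=pos, edge_index=edge_index)
```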
Measuring the similarity between different tasks is critical in various machine learning problems, including transfer, multi-task, continual, and meta learning. Most current approaches to measuring task similarity are architecture-dependent: they either 1) rely on pre-trained models, or 2) train networks on the tasks and use forward transfer as a proxy for task similarity. In this paper, we leverage optimal transport theory and define a novel task embedding for supervised classification that is model-agnostic, training-free, and capable of handling (partially) disjoint label sets. In short, given a dataset with ground-truth labels, we perform a label embedding via multi-dimensional scaling and concatenate the dataset samples with their corresponding label embeddings. We then define the distance between two datasets as the 2-Wasserstein distance between their updated samples. Finally, we leverage the 2-Wasserstein embedding framework to embed the tasks into a vector space in which the Euclidean distance between embedded points approximates the proposed 2-Wasserstein distance between tasks. We show that the proposed embedding leads to significantly faster comparison of tasks than related approaches such as the Optimal Transport Dataset Distance (OTDD). Furthermore, we demonstrate the effectiveness of our proposed embedding through various numerical experiments and show statistically significant correlations between our proposed distance and the forward and backward transfer between tasks.
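A simplified sketch of the distance computation described above (our illustration; the variable names, the provenance of the label dissimilarity matrix, and the use of the POT library are assumptions, and the final 2-Wasserstein embedding step is omitted):

```python
import numpy as np
import ot                                      # POT: Python Optimal Transport
from sklearn.manifold import MDS

def augment(samples, labels, label_dist, emb_dim=2):
    """Concatenate each sample with an MDS embedding of its (integer) label.
    label_dist: precomputed label-to-label dissimilarity matrix (assumed given)."""
    label_emb = MDS(n_components=emb_dim,
                    dissimilarity='precomputed').fit_transform(label_dist)
    return np.hstack([samples, label_emb[labels]])

def task_distance(task_a, task_b):
    """2-Wasserstein distance between the augmented sample clouds of two tasks."""
    M = ot.dist(task_a, task_b)                # pairwise squared Euclidean costs
    a = np.full(len(task_a), 1 / len(task_a))  # uniform weights on samples
    b = np.full(len(task_b), 1 / len(task_b))
    return np.sqrt(ot.emd2(a, b, M))           # W2 = sqrt of optimal total cost
```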
Predicting drug-target interaction is key to drug discovery. Recent deep learning-based methods have shown promising performance, but two challenges remain: (i) how to explicitly model and learn local interactions between drugs and targets for better prediction and interpretation; (ii) how to generalize prediction performance to novel drug-target pairs from different distributions. In this work, we propose DrugBAN, a deep bilinear attention network (BAN) framework with domain adaptation to explicitly learn pairwise local interactions between drugs and targets and to adapt to out-of-distribution data. DrugBAN operates on drug molecular graphs and target protein sequences to perform prediction, with conditional domain adversarial learning to align learned interaction representations across different distributions for better generalization to novel drug-target pairs. Experiments on three benchmark datasets under both in-domain and cross-domain settings show that DrugBAN achieves the best overall performance against five state-of-the-art baselines. Moreover, visualizing the learned bilinear attention maps provides interpretable insights into the prediction results.
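A minimal sketch of a bilinear attention map between drug-atom and protein-residue features, in the spirit of a BAN module (hypothetical dimensions and simplifications on our part; not the DrugBAN implementation):

```python
import torch
import torch.nn as nn

class BilinearAttention(nn.Module):
    """Scores every atom-residue pair and pools a joint representation."""
    def __init__(self, drug_dim, prot_dim, hidden_dim):
        super().__init__()
        self.U = nn.Linear(drug_dim, hidden_dim, bias=False)   # drug projection
        self.V = nn.Linear(prot_dim, hidden_dim, bias=False)   # protein projection

    def forward(self, drug_feats, prot_feats):
        """drug_feats: (atoms, drug_dim); prot_feats: (residues, prot_dim)."""
        d, p = self.U(drug_feats), self.V(prot_feats)
        att = torch.softmax(d @ p.t(), dim=-1)   # (atoms, residues) pair scores
        # Joint feature: attention-weighted bilinear pooling over all pairs
        joint = torch.einsum('ar,ah,rh->h', att, d, p)
        return att, joint                        # att is the interpretable map
```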
Most existing scene text detectors focus on detecting characters or words, which capture only partial text messages due to missing contextual information. For a better understanding of text in scenes, it is more desirable to detect contextual text blocks (CTBs), each consisting of one or more integral text units (e.g., characters, words, or phrases) in natural reading order and conveying a complete text message. This paper presents contextual text detection, a new setup that detects CTBs for better understanding of text in scenes. We formulate the new setup as a dual detection task that first detects integral text units and then groups them into CTBs. To this end, we design a novel scene text clustering technique that treats integral text units as tokens and groups them (those belonging to the same CTB) into ordered token sequences. In addition, we create two datasets, SCUT-CTW-Context and ReCTS-Context, to facilitate future research, in which each CTB is well annotated by its ordered integral text units. Furthermore, we introduce three metrics that measure contextual text detection in terms of local accuracy, continuity, and global accuracy. Extensive experiments show that our method detects CTBs accurately and effectively facilitates downstream tasks such as text classification and translation. The project is available at https://sg-vilab.github.io/publication/xue20222contextual/.
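Since the paper's grouping step is a learned clustering, the following is only a naive geometric stand-in (entirely our own, with made-up thresholds) illustrating the shape of the output, i.e., each CTB as an ordered sequence of detected units:

```python
def group_into_ctbs(boxes, max_gap=30):
    """boxes: list of (x, y, w, h) for detected integral text units."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))  # top-down, left-right
    blocks, current = [], [ordered[0]]
    for box in ordered[1:]:
        # Start a new CTB when the vertical jump from the previous unit is large
        if abs(box[1] - current[-1][1]) > max_gap:
            blocks.append(current)
            current = [box]
        else:
            current.append(box)
    blocks.append(current)
    return blocks          # each block is an ordered token (unit) sequence
```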